Variance estimation and asymptotic confidence bands
for the mean estimator of sampled functional data
with high entropy unequal probability sampling designs

Herve CARDOT (a), Camelia GOGA (a) and Pauline LARDIN (a,b)
(a) Universite de Bourgogne, Institut de Mathematiques de Bourgogne,
9 av. Alain Savary, 21078 DIJON, FRANCE
(b) EDF, R&D, ICAME-SOAD, 1 av. du General de Gaulle,
92141 CLAMART, FRANCE

October 16, 2012 
Abstract 

For fixed size sampling designs with high entropy it is well known that the variance
of the Horvitz-Thompson estimator can be approximated by the Hajek formula. The
interest of this asymptotic variance approximation is that it only involves the first order
inclusion probabilities of the statistical units. We extend this variance formula when the
variable under study is functional and we prove, under general conditions on the regularity
of the individual trajectories and the sampling design, that it asymptotically provides
a uniformly consistent estimator of the variance function of the Horvitz-Thompson
estimator of the mean function. Rates of convergence to the true variance function are
given for rejective sampling. We deduce, under conditions on the entropy of the sampling
design, that it is possible to build confidence bands whose coverage is asymptotically the
desired one via simulation of Gaussian processes whose variance function is given by the
Hajek formula. Finally, the accuracy of the proposed variance estimator is evaluated on
samples of electricity consumption data measured every half an hour over a period of
one week.

Keywords : covariance function, finite population, Hajek approximation, Horvitz-Thompson
estimator, Kullback-Leibler divergence, rejective sampling, unequal probability sampling
without replacement.

1 Introduction

Computing the variance of the Horvitz-Thompson estimator for unequal probability sampling
designs can be difficult because the variance formula involves second order inclusion
probabilities, which are not always known. The Hajek variance formula, derived in Hajek (1964)
for rejective sampling, is an asymptotic approximation which only requires the knowledge of
the first order inclusion probabilities and is easy to compute. It is shown in Hajek (1964) and
Chen et al. (1994) that, for given first order inclusion probabilities, rejective sampling
is the fixed size sampling design with the highest entropy, and the validity of this formula is
closely related to the value of the entropy of the considered sampling design. Hajek (1981)
proves that this approximation is also valid for Sampford-Durbin sampling, whereas Berger
(1998a) gives general conditions on the relative entropy of the sampling design, also called
Kullback-Leibler divergence, which justify the use of this approximated variance formula.
Variants and refinements of the Hajek variance formula as well as variance estimators
are proposed in Deville and Tille (2005). Matei and Tille (2005) show on simulations
that these approximations to the variance of Horvitz-Thompson estimators work well, even
for moderate sample sizes, provided that the entropy of the underlying sampling design is high
enough. Recently, Deville and Tille (2005) and Fuller (2009) consider balanced, or approximately
balanced, sampling algorithms which can be useful to build designs with fixed size
and given inclusion probabilities by just balancing on the inclusion probabilities, and relate
these sampling designs to rejective sampling, so that the Hajek variance approximation
remains valid.

When the aim is to build confidence intervals, the asymptotic distribution of the Horvitz-Thompson
estimator is required. The Central Limit Theorem has been checked by Erdos
and Renyi (1959) and Hajek (1960) for simple random sampling without replacement,
by Hajek (1964) for rejective sampling and by Visek (1979) for Sampford sampling.
Berger (1998b) states that the Kullback-Leibler divergence of the considered sampling design,
with respect to rejective sampling, should tend to zero when the sample size gets larger
for the Horvitz-Thompson estimator to be asymptotically Gaussian.

In recent studies in survey sampling the target was not a real valued mean or a mean
vector but a mean function (see Cardot and Josserand (2011) and Cardot et al. (2012b) for
the estimation of electricity consumption curves) and one important issue was how to build
confidence bands when using $\pi$ps sampling designs. A rapid technique that is well adapted
to large samples has been studied in Degras (2011) and Cardot et al. (2012a). It consists in
first estimating the covariance function of the mean estimator and then simulating a Gaussian
process, whose covariance function is the estimated covariance function, in order to determine
the distribution of its supremum. This strategy, which has been employed successfully in
Cardot et al. (2012b) to build confidence bands, requires an effective estimator of
the variance function of the Horvitz-Thompson estimator. The aim of this work is to prove,
under general assumptions on the sampling design and on the regularity of the trajectories,
that the Hajek formula provides a uniformly consistent estimator of the variance function, so
that the confidence bands built with the procedure described previously can be rigorously
justified.

The paper is organized as follows. The notations and our estimators are presented in
Section 2. In Section 3, we state our main result, the uniform convergence of the estimated
variance function under broad assumptions on the regularity of the trajectories and the
sampling design. We deduce that if the Horvitz-Thompson estimator is pointwise asymptotically
Gaussian, it also satisfies, under the same conditions, a functional central limit theorem
and the confidence bands obtained by the Gaussian process simulation techniques have an
asymptotic coverage which is the desired one. In Section 4, we evaluate the performance of
the covariance function estimator on samples drawn from a population of N = 15055 electricity
consumption curves measured every half an hour over a period of one week. Note that there
are many ways of drawing samples with a high entropy sampling distribution and given first
order inclusion probabilities (see e.g. Brewer and Hanif (1983), Tille (2006), Bondesson et al.
(2006) and Bondesson (2010)). Because of our large population and large sample context, we
use the Cube algorithm (Deville and Tille (2004)), for which a very fast implementation that can
deal with populations of millions of units has been developed in Chauvet and Tille (2006).
The proofs are gathered in an Appendix.



2 Variance estimation and the Hajek formula

Let us consider a finite population $U_N = \{1, \ldots, N\}$ of size $N$, supposed to be known, and
suppose that, for each unit $k$ of the population $U_N$, we can observe a deterministic curve
$Y_k = (Y_k(t))_{t \in [0,T]}$. We want to estimate the mean trajectory $\mu_N(t)$, $t \in [0,T]$, defined as
follows:

$$\mu_N(t) = \frac{1}{N} \sum_{k \in U_N} Y_k(t). \qquad (1)$$



We consider a sample $s$, with fixed size $n$, drawn from $U_N$ according to a fixed-size
sampling design $p_N(s)$, where $p_N(s)$ is the probability of drawing the sample $s$. The mean
curve $\mu_N(t)$ is estimated by the Horvitz-Thompson estimator (Cardot et al. (2010))

$$\hat{\mu}(t) = \frac{1}{N} \sum_{k \in s} \frac{Y_k(t)}{\pi_k} = \frac{1}{N} \sum_{k \in U} \frac{Y_k(t)}{\pi_k} \mathbf{1}_k, \quad t \in [0,T], \qquad (2)$$

where $\mathbf{1}_k$ is the sample membership indicator, $\mathbf{1}_k = 1$ if $k \in s$ and $\mathbf{1}_k = 0$ otherwise. We
denote by $\pi_k = E_p(\mathbf{1}_k)$ the first order inclusion probability of unit $k$ with respect to the
sampling design $p_N(s)$ and we suppose that $\pi_k > 0$ for all units $k$ in $U$. It is well known that,
for each value of $t \in [0,T]$, $\hat{\mu}(t)$ is a design-unbiased estimator of $\mu_N(t)$, i.e. $E_p(\hat{\mu}(t)) = \mu_N(t)$.
We denote by $\pi_{kl} = E_p(\mathbf{1}_k \mathbf{1}_l)$ the second order inclusion probabilities and we suppose that
$\pi_{kl} > 0$.
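
As an illustration of the Horvitz-Thompson estimator of the mean curve, here is a minimal sketch in Python. It assumes the sampled curves are stored row-wise on a common time grid; the function name `horvitz_thompson_mean` and the toy data are hypothetical and only serve to show the weighting by $1/\pi_k$.

```python
import numpy as np

def horvitz_thompson_mean(Y_sample, pi_sample, N):
    """Horvitz-Thompson estimate of the mean curve on a grid.

    Y_sample : (n, D) array, one sampled curve per row.
    pi_sample: (n,) first order inclusion probabilities of the sampled units.
    N        : population size.
    """
    Y_sample = np.asarray(Y_sample, dtype=float)
    pi_sample = np.asarray(pi_sample, dtype=float)
    # (1/N) * sum over sampled units of Y_k(t) / pi_k
    return (Y_sample / pi_sample[:, None]).sum(axis=0) / N

# Toy check (hypothetical data): with a census, every pi_k equals 1 and the
# estimator reduces to the population mean curve.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 4))
mu_hat = horvitz_thompson_mean(Y, np.ones(5), N=5)
```

With unequal probabilities, units that were unlikely to be selected receive large weights, which is what makes the estimator design-unbiased.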

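Since the sample size is fixed, the variance $\gamma_p(t,t)$, for each instant $t$, of the estimator $\hat{\mu}(t)$
is given by the Yates and Grundy formula (see Yates and Grundy (1953) and Sen (1953)),

$$\gamma_p(t,t) = -\frac{1}{2} \frac{1}{N^2} \sum_{k \in U} \sum_{l \in U, l \neq k} (\pi_{kl} - \pi_k \pi_l) \left( \frac{Y_k(t)}{\pi_k} - \frac{Y_l(t)}{\pi_l} \right)^2, \qquad (3)$$

and it is straightforward to express the covariance $\gamma_p(r,t)$ of $\hat{\mu}$ between two instants $r$ and $t$
as follows

$$\gamma_p(r,t) = -\frac{1}{2} \frac{1}{N^2} \sum_{k \in U} \sum_{l \in U, l \neq k} (\pi_{kl} - \pi_k \pi_l) \left( \frac{Y_k(t)}{\pi_k} - \frac{Y_l(t)}{\pi_l} \right) \left( \frac{Y_k(r)}{\pi_k} - \frac{Y_l(r)}{\pi_l} \right). \qquad (4)$$

The variance formula (3) clearly indicates that if the first order inclusion probabilities are
chosen to be approximately proportional to $Y_k(t)$, the variance of the estimator $\hat{\mu}(t)$ will be
small. Thus, if we have an auxiliary variable $X$, whose value $x_k$, supposed to be positive, is
known for all the units $k \in U$, and if $X$ is correlated with the variable of interest, it can be
interesting to consider a sampling design whose first order inclusion probabilities are given
by

$$\pi_k = n \frac{x_k}{\sum_{l \in U} x_l}.$$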
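
The inclusion probabilities proportional to an auxiliary variable can be computed as in the following sketch. The function name `pps_inclusion_probabilities` and the toy values are hypothetical; the check that no probability exceeds one is an added safeguard of this sketch, not a step described in the text.

```python
import numpy as np

def pps_inclusion_probabilities(x, n):
    """First order inclusion probabilities proportional to x, summing to n."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("auxiliary values must be positive")
    pi = n * x / x.sum()
    if np.any(pi > 1):
        # Safeguard added in this sketch: very large units would need to be
        # handled separately (for instance selected with certainty).
        raise ValueError("some inclusion probabilities exceed 1")
    return pi

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical auxiliary values
pi = pps_inclusion_probabilities(x, n=2)
```

By construction the probabilities sum to the sample size $n$, as required for a fixed size design.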



There are many ways of building sampling designs with given first order inclusion probabilities
(see e.g. Brewer and Hanif (1983) and Tille (2006)) and we focus here on the designs with
high entropy, where the entropy of a sampling design (a discrete probability distribution
on $U_N$) is defined by

$$H(p_N) = -\sum_{s} p_N(s) \ln(p_N(s)),$$

with the convention $0 \ln 0 = 0$. Chen et al. (1994) have proved that for given first order
inclusion probabilities, the rejective sampling, or conditional Poisson sampling, is the fixed
size sampling design with the highest entropy. Then, a key result is the following uniform
approximation to the second order inclusion probabilities,

$$\pi_{kl} = \pi_k \pi_l \left\{ 1 - \frac{(1-\pi_k)(1-\pi_l)}{d(\pi)} [1 + o(1)] \right\}, \qquad (5)$$

where $d(\pi) = \sum_{k \in U} \pi_k (1 - \pi_k)$ is supposed to tend to infinity. This approximation, which is
satisfied for the rejective sampling and the Sampford-Durbin sampling (see Hajek (1981)),
appears to be very efficient when the sample size is large enough (and thus the value of $d(\pi)$ is
also large) and the entropy of the sampling design is close to the maximum entropy. It also
ensures that the variance estimator given below is always positive.
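
To get a feeling for the quality of this approximation, the sketch below evaluates its leading term (the $o(1)$ remainder is dropped) for simple random sampling without replacement, where the exact second order inclusion probabilities $n(n-1)/(N(N-1))$ are known. The function name `hajek_second_order` and the values of $N$ and $n$ are hypothetical.

```python
import numpy as np

def hajek_second_order(pi):
    """Leading term of the Hajek approximation to the pi_kl (o(1) term dropped)."""
    pi = np.asarray(pi, dtype=float)
    d = np.sum(pi * (1.0 - pi))                      # d(pi)
    approx = np.outer(pi, pi) * (1.0 - np.outer(1.0 - pi, 1.0 - pi) / d)
    np.fill_diagonal(approx, pi)                     # pi_kk = pi_k by convention
    return approx

# Simple random sampling without replacement: pi_k = n / N for every unit.
N, n = 100, 10
pi = np.full(N, n / N)
approx = hajek_second_order(pi)
exact = n * (n - 1) / (N * (N - 1))                  # exact pi_kl for k != l
gap = abs(approx[0, 1] - exact)
```

In this toy case the approximation gives 0.0091 while the exact value is about 0.00909, so the two agree to within $10^{-5}$.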

By plugging approximation (5) into (4), we obtain the Hajek approximation $\gamma_H(r,t)$ of
the covariance $\mathrm{cov}(\hat{\mu}(t), \hat{\mu}(r))$ in the functional case: for all $(r,t) \in [0,T]^2$,

$$\gamma_H(r,t) = \frac{1}{N^2} \left[ \sum_{k \in U} \frac{Y_k(t) Y_k(r)}{\pi_k} (1 - \pi_k) - \frac{1}{d(\pi)} \sum_{k \in U} \sum_{l \in U} (1-\pi_k)(1-\pi_l) Y_k(t) Y_l(r) \right]. \qquad (6)$$

We consider in the following two variance estimators



7H(r, t) 



1 d(ir) 
N^d^r) 



1 - 7T k 1 - 7T; 



7T/ 



(7) 



and

$$\hat{\gamma}^*_H(r,t) = \frac{1}{N^2} \left[ \sum_{k \in s} \frac{1-\pi_k}{\pi_k^2} Y_k(t) Y_k(r) - \frac{1}{\hat{d}(\pi)} \sum_{k \in s} \sum_{l \in s} \frac{1-\pi_k}{\pi_k} \frac{1-\pi_l}{\pi_l} Y_k(t) Y_l(r) \right], \qquad (8)$$

where $\hat{d}(\pi) = \sum_{k \in s} (1 - \pi_k)$. Note that $\hat{\gamma}_H(r,t)$ is the functional analogue of the slightly modified
variance estimator proposed by Berger (1998a) in the real case. More exactly, the variance
estimator considered by Berger (1998a) is $\hat{\gamma}_H(r,t)$ multiplied by the correction factor $n/(n-1)$,
which yields the exact expression for simple random sampling without replacement.
The second estimator, $\hat{\gamma}^*_H(r,t)$, is the extension to the functional case of the estimator of
Deville and Tille (2005), and it has been successfully used in a very recent study by Cardot et al.
(2012b) to build confidence bands for the mean electricity consumption curve.



We can easily show the following property.

Proposition 2.1. If, for all $t \in [0,T]$, there is a constant $c_t$ such that $Y_k(t) = c_t \pi_k$, then
$\gamma_H(r,t) = 0$ and $\hat{\gamma}_H(r,t) = \hat{\gamma}^*_H(r,t) = 0$.
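
The property above can be checked numerically. The sketch below is only an illustration under stated assumptions: `hajek_covariance` evaluates the Hajek approximation from the whole population, and `hajek_star_estimator` assumes that the second estimator takes the usual Deville-Tille form, with sample weights $(1-\pi_k)$ and $\hat{d}(\pi) = \sum_{k \in s}(1-\pi_k)$. Both function names and the toy data are hypothetical.

```python
import numpy as np

def hajek_covariance(Y, pi):
    """Hajek approximation gamma_H on the grid, from the population (Y: N x D)."""
    N = Y.shape[0]
    w = 1.0 - pi
    d = np.sum(pi * w)                                # d(pi)
    first = (Y * (w / pi)[:, None]).T @ Y             # sum_k (1-pi_k) Y_k(t) Y_k(r) / pi_k
    tot = (Y * w[:, None]).sum(axis=0)                # sum_k (1-pi_k) Y_k(t)
    return (first - np.outer(tot, tot) / d) / N**2

def hajek_star_estimator(Y_s, pi_s, N):
    """Sample-based estimator, assuming the Deville-Tille form (Y_s: n x D)."""
    w = 1.0 - pi_s
    d_hat = w.sum()                                   # estimated d(pi)
    Z = Y_s / pi_s[:, None]                           # weighted curves Y_k / pi_k
    first = (Z * w[:, None]).T @ Z
    tot = (Z * w[:, None]).sum(axis=0)
    return (first - np.outer(tot, tot) / d_hat) / N**2

rng = np.random.default_rng(1)
N = 50
pi = rng.uniform(0.1, 0.9, size=N)
c = np.array([1.0, 2.0, -0.5])                        # one constant c_t per instant
Y_prop = pi[:, None] * c[None, :]                     # Y_k(t) = c_t * pi_k
g_pop = hajek_covariance(Y_prop, pi)
g_smp = hajek_star_estimator(Y_prop[:10], pi[:10], N)

# Equal probabilities (n = 10): the Hajek variance equals the exact simple
# random sampling variance up to the factor (N - 1) / N.
n = 10
Y = rng.normal(size=(N, 3))
g_srs = np.diag(hajek_covariance(Y, np.full(N, n / N)))
exact_srs = (1 - n / N) * Y.var(axis=0, ddof=1) / n
```

Both `g_pop` and `g_smp` vanish up to rounding errors, and `g_srs` matches `exact_srs * (N - 1) / N`, which shows how close the approximation is to the exact variance under equal probabilities.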

With real data, we do not observe $Y_k(t)$ at all instants $t$ in $[0,T]$ but only for a finite
set of $D$ measurement times, $0 = t_1 < \ldots < t_D = T$. In functional data analysis, when the
noise level is low and the grid of discretization points is fine, it is usual to perform a linear
interpolation or a smoothing of the discretized trajectories in order to obtain approximations
of the trajectories at every instant $t$ (cf. Ramsay and Silverman (2005)). When there are
no measurement errors and when the trajectories are regular enough, Cardot and Josserand
(2011) showed that linear interpolation can provide sufficiently accurate approximations of
the trajectories. Thus, for each unit $k$ in the sample $s$, we build the interpolated trajectory

$$Y_{k,d}(t) = Y_k(t_i) + \frac{Y_k(t_{i+1}) - Y_k(t_i)}{t_{i+1} - t_i} (t - t_i), \quad t \in [t_i, t_{i+1}],$$

and define the estimator of the mean curve $\mu_N(t)$ based on the discretized observations as
follows

$$\hat{\mu}_d(t) = \frac{1}{N} \sum_{k \in s} \frac{Y_{k,d}(t)}{\pi_k}, \quad t \in [0,T]. \qquad (9)$$

Its covariance is then estimated by $\hat{\gamma}_{H,d}(r,t)$, referred to as (10), which is the estimator (7)
computed with the interpolated trajectories $Y_{k,d}$ in place of $Y_k$, and we show in the next section
that it is a uniformly consistent estimator of the variance function. Replacing $Y_k(t)$ by $Y_{k,d}(t)$
in (8) yields the variance estimator $\hat{\gamma}^*_{H,d}(r,t)$ based on discretized values.
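
The interpolation step followed by the Horvitz-Thompson weighting can be sketched as follows. This is a minimal illustration relying on `numpy.interp`; the helper name `interpolate_curves`, the grid and the data are hypothetical.

```python
import numpy as np

def interpolate_curves(Y_obs, t_obs, t_new):
    """Linear interpolation of each discretized curve (rows of Y_obs) at t_new."""
    return np.vstack([np.interp(t_new, t_obs, y) for y in Y_obs])

t_obs = np.array([0.0, 1.0, 2.0])              # measurement times
Y_obs = np.array([[0.0, 2.0, 2.0],             # two sampled curves
                  [1.0, 1.0, 3.0]])
t_new = np.array([0.5, 1.5])                   # instants between grid points
Y_d = interpolate_curves(Y_obs, t_obs, t_new)

# Horvitz-Thompson mean of the interpolated curves (hypothetical pi_k and N).
pi_s = np.array([0.5, 0.5])
N = 4
mu_d = (Y_d / pi_s[:, None]).sum(axis=0) / N
```

Since both the interpolation and the estimator are linear in the observations, interpolating the estimated mean at the grid points gives the same curve as averaging the interpolated trajectories.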

3 Asymptotic properties 

All the proofs are postponed to the Appendix.

3.1 Assumptions

To demonstrate the asymptotic properties, we must suppose that the sample size and the
population size become large. Therefore, we adopt the asymptotic approach of Hajek (1964),
assuming that $d(\pi) \to \infty$. Note that this assumption implies that $n \to \infty$ and $N - n \to \infty$.
We consider a sequence of growing and nested populations $U_N$ with size $N$ tending to infinity
and a sequence of samples $s_N$ of size $n_N$ drawn from $U_N$ according to the sampling design
$p_N(s_N)$. The first and second order inclusion probabilities are respectively denoted by $\pi_{kN}$
and $\pi_{klN}$. For simplicity of notations and when there is no ambiguity, we drop the subscript
$N$. To prove our asymptotic results we need to introduce the following assumptions.

A1. We assume that $\lim_{N \to \infty} \frac{n}{N} = \pi \in \, ]0,1[$.

A2. We assume that $\min_{k \in U} \pi_k \geq \lambda > 0$, $\min_{k \neq l} \pi_{kl} \geq \lambda^* > 0$ and

$$\pi_{kl} = \pi_k \pi_l \left\{ 1 - \frac{(1-\pi_k)(1-\pi_l)}{d(\pi)} [1 + o(1)] \right\}.$$

A3. There are two positive constants $C_2$ and $C_3$ and $\beta > 1/2$ such that, for all $N$ and for
all $(r,t) \in [0,T] \times [0,T]$,

$$\frac{1}{N} \sum_{k \in U} (Y_k(0))^2 < C_2 \quad \text{and} \quad \frac{1}{N} \sum_{k \in U} (Y_k(t) - Y_k(r))^2 < C_3 |t - r|^{2\beta}.$$

A4. There are two positive constants $C_4$ and $C_5$ such that, for all $N$ and for all $(r,t) \in
[0,T] \times [0,T]$,

$$\frac{1}{N} \sum_{k \in U} (Y_k(0))^4 < C_4 \quad \text{and} \quad \frac{1}{N} \sum_{k \in U} (Y_k(t) - Y_k(r))^4 < C_5 |t - r|^{4\beta}.$$

A5. We assume that

$$\lim_{N \to \infty} \max_{(k_1,l_1,k_2,l_2) \in D_{4,N}} \left| E_p \left[ (\mathbf{1}_{k_1 l_1} - \pi_{k_1} \pi_{l_1})(\mathbf{1}_{k_2 l_2} - \pi_{k_2} \pi_{l_2}) \right] \right| = 0,$$

where $\mathbf{1}_{kl}$ is the sample membership indicator of the couple $(k,l)$ and $D_{4,N}$ is the set of all
distinct quadruples $(i_1, \ldots, i_4)$ from $U_N$.
Assumptions A1 and A2 are classical hypotheses in survey sampling and deal with the
first and second order inclusion probabilities. They are satisfied for many usual sampling
designs with fixed size (see for example Hajek (1981)). They directly imply that $cn < d(\pi) < n$
for some strictly positive constant $c$. Assumption A2 implies that
$\limsup_{N \to \infty} n \max_{k \neq l} |\pi_{kl} - \pi_k \pi_l| < C_1 < \infty$. It also ensures that the Yates-Grundy
variance estimator is always positive since $\pi_{kl} < \pi_k \pi_l$.

Assumptions A3 and A4 are regularity conditions on the individual trajectories. Even if
pointwise consistency, for each fixed value of $t$, can be proved without any condition on $\beta$,
these regularity conditions are required to get uniform convergence of the mean estimator
(see Cardot and Josserand (2011)). Note finally that assumption A5 is true for SRSWOR,
stratified sampling and rejective sampling (Boistard et al. (2012)). More generally, it also
holds for unequal probability designs with large entropy as shown in the following proposition.
Let us first recall the definition of the Kullback-Leibler divergence $K(p_N, p_{rej})$,

$$K(p_N, p_{rej}) = \sum_{s} p_N(s) \ln \left( \frac{p_N(s)}{p_{rej}(s)} \right), \qquad (11)$$

which measures how far a sampling distribution $p_N(s)$ is from a reference sampling
distribution, chosen here to be the rejective sampling $p_{rej}(s)$ since it is the design with
maximum entropy for given first order inclusion probabilities. We can now state the following
proposition, which gives an upper bound for the rate of convergence to zero of the quantity
in A5 in terms of the Kullback-Leibler divergence with respect to the rejective sampling. We
consider a sampling design $p_N$ with the same first order inclusion probabilities as $p_{rej}$. We
have

Proposition 3.1. If $d(\pi) \to \infty$, then

$$\max_{(k_1,l_1,k_2,l_2) \in D_{4,N}} \left| E_p \left[ (\mathbf{1}_{k_1 l_1} - \pi_{k_1} \pi_{l_1})(\mathbf{1}_{k_2 l_2} - \pi_{k_2} \pi_{l_2}) \right] \right| \leq \frac{C}{d(\pi)} + \sqrt{K(p_N, p_{rej})/2}$$

for some constant $C$.

A direct consequence of Proposition 3.1 is that assumption A5 is satisfied for rejective
sampling as well as for the Sampford-Durbin design, whose Kullback-Leibler divergence, with
respect to the rejective sampling, tends to zero as the sample size $n$ tends to infinity (see
Berger (1998b)). Note also that the Kullback-Leibler divergence has been approximated
asymptotically for other sampling designs such as Pareto sampling in Lundqvist (2007).
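
For very small populations, the entropy and the Kullback-Leibler divergence of fixed size designs can be computed by brute-force enumeration, as in the sketch below. The helper names are hypothetical, and the probabilities `p` passed to `rejective_design` are the parameters of the underlying Poisson design, which in general differ from the resulting first order inclusion probabilities. The two designs compared here do not share the same inclusion probabilities, so the example only illustrates how the quantities are computed.

```python
import itertools
import math
import numpy as np

def rejective_design(p, n):
    """Conditional Poisson design: p(s) proportional to prod_{k in s} p_k / (1 - p_k)."""
    p = np.asarray(p, dtype=float)
    odds = p / (1.0 - p)
    samples = list(itertools.combinations(range(len(p)), n))
    w = np.array([np.prod(odds[list(s)]) for s in samples])
    return samples, w / w.sum()

def entropy(prob):
    """H(p) = -sum_s p(s) ln p(s), with the convention 0 ln 0 = 0."""
    prob = prob[prob > 0]
    return -np.sum(prob * np.log(prob))

def kullback_leibler(p, q):
    """K(p, q) = sum_s p(s) ln(p(s) / q(s)), summing over s with p(s) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Equal Poisson parameters give simple random sampling without replacement.
_, p_equal = rejective_design(np.full(5, 0.4), n=2)
_, p_unequal = rejective_design(np.array([0.2, 0.3, 0.4, 0.5, 0.6]), n=2)
H_equal = entropy(p_equal)
H_unequal = entropy(p_unequal)
K = kullback_leibler(p_unequal, p_equal)
```

With 5 units and samples of size 2 there are 10 possible samples, so the uniform design reaches the maximal entropy $\ln 10$, and the divergence of any other design from it equals $\ln 10$ minus its entropy.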



3.2 Convergence of the estimated variance

Let us first recall Proposition 3.3 in Cardot and Josserand (2011), which states that the
estimator $\hat{\mu}_d$ is asymptotically design unbiased and uniformly consistent under mild
assumptions. More precisely, if assumptions (A1)-(A3) hold and if the discretization scheme satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$, then for some constant $C$

$$\sqrt{n} \, E_p \left\{ \sup_{t \in [0,T]} |\hat{\mu}_d(t) - \mu_N(t)| \right\} < C.$$

We can now state our first result, which indicates that $\hat{\gamma}_{H,d}$ is pointwise consistent and that
the variance function estimator of the estimated mean trajectory is uniformly consistent.
Note that additional assumptions on the sampling design are required to obtain the rate
of convergence.

Proposition 3.2. Assume (A1)-(A5) hold and the sequence of discretization schemes satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i| = o(1)$. When $N$ tends to infinity,

$$n E_p \left\{ |\hat{\gamma}_{H,d}(r,t) - \gamma_p(r,t)| \right\} \to 0$$

for all $(r,t) \in [0,T]^2$ and

$$n E_p \left\{ \sup_{t \in [0,T]} |\hat{\gamma}_{H,d}(t,t) - \gamma_p(t,t)| \right\} \to 0.$$

We can state the same result for the second variance estimator, $\hat{\gamma}^*_{H,d}$.

Proposition 3.3. Assume (A1)-(A5) hold and the sequence of discretization schemes satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i| = o(1)$. When $N$ tends to infinity,

$$n E_p \left\{ |\hat{\gamma}^*_{H,d}(r,t) - \gamma_p(r,t)| \right\} \to 0$$

for all $(r,t) \in [0,T]^2$ and

$$n E_p \left\{ \sup_{t \in [0,T]} |\hat{\gamma}^*_{H,d}(t,t) - \gamma_p(t,t)| \right\} \to 0.$$

A sharper result can be stated for the particular case of rejective sampling, for which
accurate approximations to the multiple inclusion probabilities are available (see Boistard
et al. (2012)).

Proposition 3.4. Suppose that the sample is selected with the rejective sampling design.
Assume (A1)-(A4) hold and the sequence of discretization schemes satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = O(n^{-1})$. Then, for all $(r,t) \in [0,T]^2$,

$$n^3 E_p \left[ \left( \hat{\gamma}_{H,d}(r,t) - \gamma_p(r,t) \right)^2 \right] \leq C$$

for some positive constant $C$.

We can note in the proof, given in the Appendix, that the approximation error to the true
variance by the Hajek formula is asymptotically negligible compared to the sampling error.



3.3 Asymptotic normality and confidence bands

Let us assume that the Horvitz-Thompson estimator satisfies a Central Limit Theorem for
real valued quantities, with new moment conditions.

A6. There is some $\delta > 0$ such that $N^{-1} \sum_{k \in U} |Y_k(t)|^{2+\delta} < \infty$ for all $t \in [0,T]$, and
$\{\gamma_p(t,t)\}^{-1/2} \{\hat{\mu}(t) - \mu(t)\} \to N(0,1)$ in distribution when $N$ tends to infinity.

Cardot and Josserand (2011) have shown that under the previous assumptions, the central
limit theorem also holds in the space of continuous functions. More precisely, if assumptions
(A1)-(A3) and (A6) hold and the discretization points satisfy
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$, we have

$$\sqrt{n} \{\hat{\mu}_d - \mu\} \to Z \quad \text{in distribution in } C[0,T],$$

where $Z$ is a Gaussian random function taking values in $C[0,T]$ with mean 0 and covariance
function $\gamma_Z(r,t) = \lim_{N \to \infty} n \gamma_p(r,t)$. This important result gives a theoretical justification
of the confidence bands built as follows. We examine now the asymptotic coverage of
confidence bands for $\mu_N$ of the form

$$\left\{ \left[ \hat{\mu}_d(t) \pm c \, \frac{\hat{\sigma}(t)}{\sqrt{n}} \right], \ t \in [0,T] \right\},$$

where $c$ is a suitable number and $\hat{\sigma}(t) = \sqrt{n \hat{\gamma}_{H,d}(t,t)}$.

Given a confidence level $1 - \alpha \in \, ]0,1[$, one way to build such confidence bands, that
is to say one way to find an adequate value for $c_\alpha$, is to perform simulations of centered
Gaussian functions $Z$ defined on $[0,T]$ with mean 0 and covariance function $n \hat{\gamma}_{H,d}(r,t)$ and
then compute the quantile of order $1 - \alpha$ of $\sup_{t \in [0,T]} |Z(t)| / \hat{\sigma}(t)$. In other words, we look for
a constant $c_\alpha$, which is in fact a random variable since it depends on the estimated covariance
function $\hat{\gamma}_{H,d}$, such that

$$P \left( |Z(t)| \leq c_\alpha \hat{\sigma}(t), \ \forall t \in [0,T] \mid \hat{\gamma}_{H,d} \right) = 1 - \alpha. \qquad (13)$$



The next proposition provides a rigorous justification of this latter technique:

Proposition 3.5. Assume (A1)-(A6) hold and the discretization scheme satisfies
$\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$.

Let $Z$ be a Gaussian process with mean zero and covariance function $\gamma_Z$. Let $(Z_N)$ be
a sequence of processes such that for each $N$, conditionally on $\hat{\gamma}_{H,d}$ defined in (10), $Z_N$ is
Gaussian with mean zero and covariance $n \hat{\gamma}_{H,d}$. Then for all $c > 0$, as $N \to \infty$, the following
convergence holds in probability:

$$P \left( |Z_N(t)| \leq c \, \hat{\sigma}(t), \ \forall t \in [0,T] \mid \hat{\gamma}_{H,d} \right) \to P \left( |Z(t)| \leq c \, \sigma(t), \ \forall t \in [0,T] \right),$$

where $\hat{\sigma}(t) = \sqrt{n \hat{\gamma}_{H,d}(t,t)}$ and $\sigma(t) = \sqrt{\gamma_Z(t,t)}$.

The proof of Proposition 3.5 is very similar to the proof of Proposition 3.5 in Cardot
et al. (2012c) and is thus omitted. As in Cardot et al. (2012a), it is possible to deduce from
the previous proposition that the chosen value $c_\alpha = c_\alpha(\hat{\gamma}_{H,d})$ provides asymptotically the desired
coverage since it satisfies

$$\lim_{N \to \infty} P \left( \mu(t) \in \left[ \hat{\mu}_d(t) \pm c_\alpha \frac{\hat{\sigma}(t)}{\sqrt{n}} \right], \ \forall t \in [0,T] \right) = 1 - \alpha.$$
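
The simulation step used to calibrate the bands can be sketched as follows: draw centered Gaussian vectors with the estimated covariance on the time grid and take the empirical quantile of the supremum of the standardized paths. The function name `band_constant`, the number of simulations and the toy covariance are assumptions of this sketch, which works on a finite grid instead of the whole interval.

```python
import numpy as np

def band_constant(cov, alpha=0.05, n_sim=10000, seed=0):
    """Quantile of order 1 - alpha of max_t |Z(t)| / sigma(t), with Z ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))                     # pointwise standard deviations
    Z = rng.multivariate_normal(np.zeros(len(sigma)), cov, size=n_sim)
    sup = np.max(np.abs(Z) / sigma, axis=1)           # supremum over the grid
    return np.quantile(sup, 1.0 - alpha)

# Hypothetical covariance on a grid of 20 instants (exponential kernel),
# standing in for n times the estimated covariance function.
t = np.linspace(0.0, 1.0, 20)
cov = np.exp(-np.abs(t[:, None] - t[None, :]))
c_band = band_constant(cov)

# With a single instant, the constant is close to the usual Gaussian quantile.
c_point = band_constant(np.array([[2.0]]))
```

The band is then obtained as the estimated mean curve plus or minus `c_band` times the estimated pointwise standard deviation of the estimator. The constant for the whole grid is larger than the pointwise value of about 1.96, which reflects the price paid for simultaneous coverage.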



4 Example: variance estimation for electricity consumption 
curves 

In this section, we evaluate the performance of the estimators $\hat{\gamma}^*_{H,d}(r,t)$ and $\hat{\gamma}_{H,d}(r,t)$ of the
functional variance $\gamma_p(r,t)$ of $\hat{\mu}_d(t)$. Simulation studies not reported here showed that the
estimators $\hat{\gamma}^*_{H,d}(r,t)$ and $\hat{\gamma}_{H,d}(r,t)$ behave very similarly asymptotically. This is why we only
give below the simulation results for $\hat{\gamma}_{H,d}(r,t)$.



We use the same data frame as in Cardot et al. (2012b). More exactly, we have a
population $U$ of $N = 15055$ electricity consumption curves measured every half an hour
during one week, so that there are $D = 336$ time points. The mean consumption during the
previous week for each meter $k$, denoted $x_k$, is used as an auxiliary variable. This variable is
strongly correlated to the consumption curve $Y_k(t)$ (the pointwise correlation is always larger
than 0.80) and is inexpensive to transmit.

We select samples $s$ of size $n$ drawn with inclusion probabilities $\pi_k$ proportional to the past
mean electricity consumption. This means that $\pi_k = n \, x_k / \sum_{l \in U} x_l$. As mentioned in
Deville and Tille (2005), this kind of sampling may be viewed as a balanced sampling with the
balancing variable $\pi = (\pi_1, \ldots, \pi_N)$. The sample was drawn using the fast version (see Chauvet and
Tille (2006)) of the cube algorithm (see Deville and Tille (2004)). As suggested in Chauvet
(2007), a random sort of the population is made before the sample selection. The true mean
consumption curve observed in the population $U$ and one estimation obtained from a sample
$s'$ of size $n = 1500$ are drawn in Figure 1.

The inclusion probabilities $\pi_{kl}$ being unknown, an empirical estimation of the covariance
function $\gamma_p$ is given, from $J = 10000$ simulations, by

$$\gamma_{emp}(r,t) = \frac{1}{J-1} \sum_{j=1}^{J} \left( \hat{\mu}_{d,j}(t) - \bar{\mu}_d(t) \right) \left( \hat{\mu}_{d,j}(r) - \bar{\mu}_d(r) \right) \qquad (14)$$

with $\hat{\mu}_{d,j}(t) = \frac{1}{N} \sum_{k \in s_j} \frac{Y_{k,d}(t)}{\pi_k}$, $\bar{\mu}_d(t) = \frac{1}{J} \sum_{j=1}^{J} \hat{\mu}_{d,j}(t)$ and $(r,t) \in [0,T]^2$. The empirical
variance function $\gamma_{emp}$ (solid line) of estimator $\hat{\mu}_d$, the Hajek approximation $\gamma_H$ (dotted
line) and one estimation $\hat{\gamma}_{H,d}$ (dashed line) obtained from the same sample $s'$ are drawn in
Figure 2.




Figure 1: Mean consumption curve and its Horvitz-Thompson estimation obtained from
sample $s'$, with $n = 1500$.

To evaluate the performance of estimator $\hat{\gamma}_{H,d}$, we consider different sample sizes, $n = 250$,
$n = 500$ and $n = 1500$. The corresponding values of $d(\pi)$ are $d(\pi) = 241.2$, $d(\pi) = 464.7$ and
$d(\pi) = 1202.3$.

Figure 2: Empirical variance $\gamma_{emp}$ (solid line), Hajek's approximation $\gamma_H$ (dotted line) and
variance estimation $\hat{\gamma}_{H,d}$ (dashed line) obtained from sample $s'$, with $n = 1500$.

For each sample size, we draw $I = 10000$ samples and we compute the following quadratic
loss criterion

|7H(t<d,*d) - 7emp(id,*d)| 2 



2 



* I l^^^- W^)! 2 ^ (is) 

J Tempi*)'' J 

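We also compute the relative mean squared error,

$$RMSE = \frac{1}{I} \sum_{i=1}^{I} R^{(i)}(\hat{\gamma}_{H,d}) = RB(\hat{\gamma}_{H,d})^2 + RV(\hat{\gamma}_{H,d}), \qquad (16)$$

where $R^{(i)}(\hat{\gamma}_{H,d})$ is the value of $R(\hat{\gamma}_{H,d})$ computed for the $i$th simulation; $RB(\hat{\gamma}_{H,d})$ and
$RV(\hat{\gamma}_{H,d})$ are the relative bias and the relative variance of estimator $\hat{\gamma}_{H,d}$, respectively. Note that
$RB(\hat{\gamma}_{H,d})^2$ is given by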

Kflhw) = 5 £ ( J 

where $\bar{\gamma}_{H,d}(t_d, t_d) = \sum_{i=1}^{I} \hat{\gamma}^{(i)}_{H,d}(t_d, t_d) / I$ and $\hat{\gamma}^{(i)}_{H,d}(t_d, t_d)$ is the variance estimation obtained
with the $i$th simulation.

                                   Quantiles of $R(\hat{\gamma}_{H,d})$
Sample size   RMSE     RB^2     5%       1st quartile   median   3rd quartile   95%
250           0.9473   0.0004   0.0188   0.0298         0.0446   0.0748         0.4326
500           0.3428   0.0002   0.0121   0.0191         0.0278   0.0456         0.3510
1500          0.1406   0.0003   0.006    0.0097         0.0144   0.0272         0.0929

Table 1: RMSE, $RB(\hat{\gamma}_{H,d})^2$ and estimation errors according to criterion $R(\hat{\gamma}_{H,d})$ for different
sample sizes, with $I = 10000$ simulations.
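
The decomposition of the relative mean squared error into squared relative bias and relative variance can be checked on simulated variance estimates, as in the following sketch. It assumes that the squared errors summed over the grid are normalized by the sum of the squared reference variances; this normalization, the function name `relative_error_summary` and the synthetic data are assumptions of the sketch and may differ from the exact criterion used in the study.

```python
import numpy as np

def relative_error_summary(gamma_hat, gamma_ref):
    """Loss per simulation, RMSE, squared relative bias and relative variance.

    gamma_hat: (I, D) variance estimates, one row per simulated sample.
    gamma_ref: (D,) reference variance function on the grid.
    """
    norm = np.sum(gamma_ref**2)                        # assumed normalization
    R = np.sum((gamma_hat - gamma_ref)**2, axis=1) / norm
    rmse = R.mean()
    gamma_bar = gamma_hat.mean(axis=0)                 # mean estimate over simulations
    rb2 = np.sum((gamma_bar - gamma_ref)**2) / norm
    rv = np.sum(gamma_hat.var(axis=0)) / norm          # variance over simulations
    return R, rmse, rb2, rv

# Synthetic estimates (hypothetical): reference variance with multiplicative noise.
rng = np.random.default_rng(2)
gamma_ref = np.linspace(1.0, 2.0, 6)
gamma_hat = gamma_ref * (1.0 + 0.1 * rng.normal(size=(200, 6)))
R, rmse, rb2, rv = relative_error_summary(gamma_hat, gamma_ref)
```

The identity `rmse == rb2 + rv` holds up to rounding, and the quantiles reported in a table such as Table 1 would be obtained with `np.quantile(R, [0.05, 0.25, 0.5, 0.75, 0.95])`.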



The estimation errors are presented in Table 1 for the three considered sample sizes. We
first note that the values of the relative bias $RB(\hat{\gamma}_{H,d})$ are very low, meaning that Hajek's
formula provides, in our relatively large sample context, a very good approximation to the
variance. The median error for $R(\hat{\gamma}_{H,d})$ is slightly larger but remains small (always less than
5%), even for moderate sample sizes ($n = 250$). This means that the most important part of
the variance estimation error is due to the sampling error. We have drawn in Figure 3 the
approximation error $\gamma_{emp}(t,r) - \gamma_{H,d}(t,r)$ and in Figure 4 the estimation error $\gamma_{emp}(t,r) -
\hat{\gamma}_{H,d}(t,r)$ for $t, r \in \{1, \ldots, D\}$, corresponding to a sample of size $n = 1500$ with an estimation
error close to the median value of the global risk, $R(\hat{\gamma}_{H,d}) = 0.0144$. It appears that the
largest estimation errors for the variance occur when the level of consumption is high. We
can also observe in these figures a kind of periodic pattern which can be related to the daily
electricity consumption behavior.

Nevertheless, we also note that the relative mean squared error RMSE, which is approximately
equal to the relative variance of the estimator $\hat{\gamma}_{H,d}$, is rather high, especially for small
sample sizes ($n = 250$). Looking at the 95% quantiles of $R(\hat{\gamma}_{H,d})$ in Table 1, we can deduce
that bad variance estimations only occur in rare cases but with very large errors. We can
note in Figure 5, which represents the distribution of the sampling weights on a logarithmic
scale, that there are many large outlying values, especially when the sample size is not very
large. The bad performance of the variance estimator, in terms of RMSE, is in fact due to a
few individuals in the population that have both a very small inclusion probability $\pi_k$ and a
consumption level $Y_k$ that can be very high at some instants of the period. Their selection
in the sample, which occurs rarely, leads to an overestimation of the mean curve and to a
large error $R(\hat{\gamma}_{H,d})$ when estimating the variance at these instants. One possible way to deal
with this issue, which will be explored in another work, would consist in correcting the
sampling weights of the most influential units of the sample (see e.g. Beaumont and Rivest
(2009)) in order to get a more stable variance estimator.

Figure 3: Approximation error $\gamma_{emp} - \gamma_{H,d}$ for a sample of size $n = 1500$.

Figure 4: Estimation error $\gamma_{emp} - \hat{\gamma}_{H,d}$ for a sample of size $n = 1500$.

Figure 5: Boxplot of $\log(1/\pi_k)$ for different sample sizes, $n = 250$, 500 and 1500.

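A Proofs

Throughout the proofs we use the letter $C$ to denote a generic constant whose value may
vary from place to place. Let us also define $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$ and $\Delta_{kk} = \pi_k (1 - \pi_k)$.

A.1 Some useful lemmas

Lemma A.1. Assume (A4) holds. There is a constant $\zeta_1$ such that

$$\frac{1}{N} \sum_{k \in U} |\phi_{k,k}(t,r)|^2 \leq \zeta_1 |t - r|^{2\beta},$$

where $\phi_{k,k}(t,r) = Y_k(t) Y_k(t) - Y_k(r) Y_k(r)$.

Proof. We have

$$\frac{1}{N} \sum_{k \in U} |\phi_{k,k}(t,r)|^2 \leq \frac{2}{N} \left[ \sum_{k \in U} |Y_k(t) - Y_k(r)|^2 |Y_k(t)|^2 + \sum_{k \in U} |Y_k(t) - Y_k(r)|^2 |Y_k(r)|^2 \right]$$
$$\leq 2 \left[ \frac{1}{N} \sum_{k \in U} |Y_k(t) - Y_k(r)|^4 \right]^{1/2} \left\{ \left[ \frac{1}{N} \sum_{k \in U} |Y_k(t)|^4 \right]^{1/2} + \left[ \frac{1}{N} \sum_{k \in U} |Y_k(r)|^4 \right]^{1/2} \right\}.$$

Under assumption (A4), we get that, for some constant $\zeta_1$,

$$\frac{1}{N} \sum_{k \in U} |\phi_{k,k}(t,r)|^2 \leq \zeta_1 |t - r|^{2\beta}.$$

□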


Lemma A.2. Assume (A3) holds. There is a constant $\zeta_2$ such that

$$\left( \frac{1}{N^2} \sum_{k \in U} \sum_{l \in U} |\phi_{k,l}(t,r)| \right)^2 \leq \zeta_2 |t - r|^{2\beta},$$

where $\phi_{k,l}(t,r) = Y_k(t) Y_l(t) - Y_k(r) Y_l(r)$.

Proof. The demonstration is similar to the proof of Lemma A.1 and is thus omitted. □



Lemma A. 3. Assume (Al) and (A2) hold. 



Ep((d(7r) - d(ir)) 2 ) < 



1 maxfc^j |A W | 



+ 



A 2 



d(vr) d(n) 



Proof. Under assumptions (Al) and (A2), 

E p (( ( i(vr)- ( i(vr)) 2 ) = EE^"™ 



u u 



7Tfe(l - 7Tfc)7r Z (l - TTl) 



< (\ + m8Jit A 1 |AMl W) i«. 



□ 



A.2 Proof of Proposition 3.1

We first consider the case of the rejective sampling $p_{rej}(s)$ and show that A5 is true if $d(\pi_N)$
tends to infinity. By Theorem 1 in Boistard et al. (2012) and hypothesis A2, we have



uniformly for $(k_1, l_1, k_2, l_2) \in D_{4,N}$. Since $\pi_{k_1} \pi_{k_2} - \pi_{k_1 k_2} = O(d(\pi)^{-1})$ and
$\pi_{l_1} \pi_{l_2} - \pi_{l_1 l_2} = O(d(\pi)^{-1})$ uniformly for $(k_1, l_1, k_2, l_2) \in D_{4,N}$, we directly obtain that,
for rejective sampling,

$$\max_{(k_1,l_1,k_2,l_2) \in D_{4,N}} \left| E_p \left[ (\mathbf{1}_{k_1 l_1} - \pi_{k_1} \pi_{l_1})(\mathbf{1}_{k_2 l_2} - \pi_{k_2} \pi_{l_2}) \right] \right| \leq \frac{C}{d(\pi)}$$

for some constant $C$.

If we consider now a different sampling design $p_N(s)$, we have, with Pinsker's inequality
(see Theorem 6.1 in Kemperman (1969)) and the property of the total variation distance,

$$\sup_{A \in \mathcal{A}_N} |p_N(A) - p_{rej}(A)| \leq \sqrt{K(p_N, p_{rej})/2},$$

where $\mathcal{A}_N$ is the set of all partitions of $U_N$. Considering the particular cases
$A = \{(k_1, l_1, k_2, l_2) \in D_{4,N}\}$, and denoting by $\pi_{k_1 k_2 l_1 l_2} = p_N(A)$ and by $\pi^{rej}_{k_1 k_2 l_1 l_2} = p_{rej}(A)$,
we directly get that

$$\sup_{(k_1,l_1,k_2,l_2) \in D_{4,N}} \left| \pi_{k_1 k_2 l_1 l_2} - \pi^{rej}_{k_1 k_2 l_1 l_2} \right| \leq \sqrt{K(p_N, p_{rej})/2}$$

and the proof is complete.



A.3 Proof of Proposition 3.2 (consistency of the covariance and the variance functions)

The proof follows the same steps as in Cardot et al. (2012c). We show first that for all
$t, r \in [0,T]$, the estimator of the covariance function $\hat{\gamma}_{H,d}(r,t)$ is consistent for $\gamma_p(r,t)$ and
then that the random variable $n(\hat{\gamma}_{H,d}(t,t) - \gamma_p(t,t))$ converges in distribution to zero in
the space $C([0,T])$. By definition of the convergence in distribution in $C([0,T])$ and the
boundedness and continuity of the sup functional, we then directly obtain the announced
result. As in Cardot et al. (2012c), in order to obtain the convergence in distribution of
$n(\hat{\gamma}_{H,d}(t,t) - \gamma_p(t,t))$, we first show the pointwise convergence, which clearly implies the
convergence of all finite linear combinations, and then check that the sequence is tight.

Step 1. Pointwise convergence

We want to show that, for each $(t,r) \in [0,T]^2$, we have

$$n E_p \left\{ |\hat{\gamma}_{H,d}(r,t) - \gamma_p(r,t)| \right\} \to 0, \quad \text{when } N \to \infty.$$

Let us decompose

$$n(\hat{\gamma}_{H,d}(r,t) - \gamma_p(r,t)) = n(\hat{\gamma}_{H,d}(r,t) - \hat{\gamma}_H(r,t)) + n(\hat{\gamma}_H(r,t) - \gamma_p(r,t))$$

and study separately the interpolation and the estimation errors.

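Interpolation error

Let us suppose that $t \in [t_i, t_{i+1}[$ and $r \in [t_{i'}, t_{i'+1}[$ and bound

$$n|\hat\gamma_{H,d}(r,t) - \hat\gamma_H(r,t)| \le \frac{n}{N^2} \Big| \sum_{k \in s} \frac{1-\pi_k}{\pi_k^2} \big( Y_{k,d}(t)Y_{k,d}(r) - Y_k(t)Y_k(r) \big) \Big| + \frac{n}{N^2 \hat d(\pi)} \Big| \sum_{k \in s}\sum_{l \in s} \frac{1-\pi_k}{\pi_k}\frac{1-\pi_l}{\pi_l} \big( Y_{k,d}(t)Y_{l,d}(r) - Y_k(t)Y_l(r) \big) \Big|.$$

Let us define $M = \max_k \pi_k$. Noting that $\frac{1}{\hat d(\pi)} \le \frac{1}{n(1-M)}$ and

$$|Y_{k,d}(t)Y_{l,d}(r) - Y_k(t)Y_l(r)| \le |Y_k(t_i) - Y_k(t)||Y_l(t_{i'})| + |Y_l(t_{i'}) - Y_l(r)||Y_k(t)| + |Y_k(t_{i+1}) - Y_k(t_i)||Y_l(t_{i'+1})| + 2|Y_k(t_{i+1}) - Y_k(t_i)||Y_l(t_{i'})| + |Y_k(t_i)||Y_l(t_{i'+1}) - Y_l(t_{i'})|,$$

we can bound, under assumptions (A1)-(A4),

$$n|\hat\gamma_{H,d}(r,t) - \hat\gamma_H(r,t)| \le C \left[ |t_i - t|^{\beta} + |t_{i'} - r|^{\beta} + |t_{i'+1} - t_{i'}|^{\beta} + 3|t_{i+1} - t_i|^{\beta} \right], \qquad (17)$$

where the constant $C$ collects the factors that are bounded under (A1)-(A4). Thus, under the assumption on the grid of discretization points,

$$n|\hat\gamma_{H,d}(r,t) - \hat\gamma_H(r,t)| = o(1).$$

Consider now the following decomposition

$$|\hat\gamma_H(r,t) - \gamma_p(r,t)| \le |\hat\gamma_H(r,t) - \gamma_H(r,t)| + |\gamma_H(r,t) - \gamma_p(r,t)| \qquad (18)$$

and study separately these two types of error.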

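Approximation error

We first show that, for each $(r, t) \in [0,T]^2$,

$$n|\gamma_H(r,t) - \gamma_p(r,t)| = o(1).$$

By introducing the approximation

$$\pi_{kl} - \pi_k\pi_l = -\pi_k\pi_l \frac{(1-\pi_k)(1-\pi_l)}{d(\pi)} + \frac{c_{kl}}{d(\pi)}, \qquad (19)$$

where $\max_{k \ne l} |c_{kl}| \to 0$, in the covariance function $\gamma_p$, we get

$$\gamma_p(r,t) = \gamma_H(r,t) - \frac{1}{2N^2}\frac{1}{d(\pi)} \sum_{k \in U}\sum_{l \ne k} c_{kl} \left( \frac{Y_k(r)}{\pi_k} - \frac{Y_l(r)}{\pi_l} \right)\left( \frac{Y_k(t)}{\pi_k} - \frac{Y_l(t)}{\pi_l} \right).$$

Thus, we directly get with assumptions (A1)-(A3) that

$$d(\pi)\,|\gamma_H(r,t) - \gamma_p(r,t)| = o(1). \qquad (20)$$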
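
The link between the Hájek approximation of the second-order inclusion probabilities and the Hájek variance formula can be checked numerically. The sketch below is an illustration for a real-valued variable (the case $r = t$), under two stated assumptions: the Yates-Grundy-Sen form of the Horvitz-Thompson variance for fixed-size designs, and the approximation $\pi_{kl} - \pi_k\pi_l \approx -\pi_k\pi_l(1-\pi_k)(1-\pi_l)/d(\pi)$ (that is, the remainder $c_{kl}$ set to zero). The data and function names are hypothetical.

```python
def hajek_closed_form(y, pi):
    """Hajek variance formula: (1/N^2) sum_k pi_k (1 - pi_k) (y_k/pi_k - R)^2."""
    N = len(y)
    d = sum(p * (1 - p) for p in pi)
    R = sum((1 - p) * yk for yk, p in zip(y, pi)) / d
    return sum(p * (1 - p) * (yk / p - R) ** 2 for yk, p in zip(y, pi)) / N ** 2

def hajek_yates_grundy(y, pi):
    """Yates-Grundy-Sen variance with pi_kl - pi_k pi_l replaced by the
    Hajek approximation (remainder c_kl assumed to be zero)."""
    N = len(y)
    d = sum(p * (1 - p) for p in pi)
    total = 0.0
    for k in range(N):
        for l in range(N):
            if k == l:
                continue
            delta_kl = -pi[k] * pi[l] * (1 - pi[k]) * (1 - pi[l]) / d
            total += delta_kl * (y[k] / pi[k] - y[l] / pi[l]) ** 2
    return -total / (2 * N ** 2)

# Made-up population of N = 5 units; the pi_k sum to the sample size n = 2.
y = [3.0, 5.0, 2.0, 8.0, 6.0]
pi = [0.2, 0.4, 0.3, 0.6, 0.5]

v_closed = hajek_closed_form(y, pi)
v_yg = hajek_yates_grundy(y, pi)
print(v_closed, v_yg)
```

The two values agree up to rounding, so the whole discrepancy between $\gamma_H$ and $\gamma_p$ is carried by the remainder terms $c_{kl}$, which is what bound (20) controls.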
Sampling error

To establish the convergence of $n(\hat\gamma_H(r,t) - \gamma_H(r,t))$ to zero in probability as $N \to \infty$, it is enough to show that, for all $(r, t) \in [0,T]^2$,

$$n^2 E_p\left[ (\hat\gamma_H(r,t) - \gamma_H(r,t))^2 \right] \to 0, \quad \text{when } N \to \infty.$$



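Noting that

$$n(\hat\gamma_H(r,t) - \gamma_H(r,t)) = \frac{n}{N^2}\left( \frac{1}{d(\pi)} - \frac{1}{\hat d(\pi)} \right) \sum_{k \in U}\sum_{l \in U} \frac{\mathbb{1}_{kl}}{\pi_k\pi_l}(1-\pi_k)(1-\pi_l)Y_k(t)Y_l(r)$$
$$+ \frac{n}{N^2} \sum_{k \in U} \left( \frac{\mathbb{1}_k}{\pi_k} - 1 \right)\frac{1-\pi_k}{\pi_k} Y_k(t)Y_k(r)$$
$$+ \frac{n}{N^2}\frac{1}{d(\pi)} \sum_{k \in U}\sum_{l \in U} \left( 1 - \frac{\mathbb{1}_{kl}}{\pi_k\pi_l} \right)(1-\pi_k)(1-\pi_l)Y_k(t)Y_l(r)$$
$$:= B_1(r,t) + B_2(r,t) + B_3(r,t), \qquad (21)$$

we get

$$n^2 E_p\left[ (\hat\gamma_H(r,t) - \gamma_H(r,t))^2 \right] \le 3E_p(B_1(r,t)^2) + 3E_p(B_2(r,t)^2) + 3E_p(B_3(r,t)^2). \qquad (22)$$

Let us show now that $E_p(B_1(r,t)^2) \to 0$ when $N \to \infty$. Let us define $M = \max_k \pi_k$. Under assumptions (A1), (A2) and (A4) and using Lemma A.3 and the inequality $\frac{1}{\hat d(\pi)} \le \frac{1}{n(1-M)}$, we have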


$$E_p(B_1(r,t)^2) \le \dots \le \frac{C}{N},$$

so that $E_p(B_1(r,t)^2) \to 0$ when $N \to \infty$.

Under assumptions (A1), (A2) and (A4), we can bound



$$E_p(B_2(r,t)^2) \le \frac{n^2}{N^4} \sum_{k \in U}\sum_{l \in U} |\Delta_{kl}| \frac{1-\pi_k}{\pi_k}\frac{1-\pi_l}{\pi_l}\frac{1}{\pi_k\pi_l} |Y_k(t)Y_k(r)Y_l(t)Y_l(r)| \le \dots \le \frac{C}{N},$$

so that $E_p(B_2(r,t)^2) \to 0$ when $N \to \infty$. For the third term, we have



E p (B 3 (r,t) 2 ) = n 2 E p 



1 1 



iV 4 d(ir) 



E E 



■^-^ V VT1.7T/ 

k,ieuk',i'eu v K ' 
.(1 - 7T fc )(l - tt,)(1 - 7r fc 0(l - 7r r )y fc (t)FKr)n'(*)^'(^ 



< 



n 2 1 



EE 



iV 4 d(7T) 2 

2n 2 1 x ^ x ^ 

f^r^E E 



E, 



7T7 



- 1 



4 



1 



iV 4 d(7r) 2 



keuk'&'eu 



E, 



7T 



+ 



n 



TV 4 



— V V E [fi*L-iVi*^_-i 



\Y k (t)\\Y k {r)\\Y k ,(t)\\Y k ,(r)\ 

\Y k (t)\\Y k (r)\\Y k ,{t)\\Y v (r)\ 

\Y k (t)\\Yl(r)\\Y kl (t)\\Yv(r)\ 



■ = Vl + v 2 + v s 

Under assumptions (A1), (A2) and (A4) and the inequality $\pi_{kl} \le \pi_k\pi_l$, we get

,2 



Vl < 



n 



+ 



< 



NH{n) 2 
n 2 1 

iV 4 d(TT) 2 

n 2 1 
Wd^ 2 



E p 

















E 

feet/ 



E E 



E, 



ln(i)| 2 |E fe (r)| 2 



+ 1 



1/1 2 A / 1 2 1 
iV U + A + 1 + U + A + 1 



|y fc (t)lin(r)i|y fc ,(t)lin'Wi 

A 1/2 / 



N 



keu 



k&U 



1/2 



(23) 



Since $d(\pi) \to \infty$ when $N \to \infty$, we have that $v_1 \to 0$. Under assumptions (A1), (A2), (A4) and (A5)

V3 -Jt + N^dUv h (k i y^D |Ep [{lkl ~ * kiri) {lh ' 1 ' ~ S l^(*)H^ WII^VWII^ (r) 

,JV (k,l,k',l')€D 4iN 



C n 2 



Jet/ 
(24) 



Hence $v_3 \to 0$ when $N \to \infty$. By the Cauchy-Schwarz inequality, we have $v_2 \to 0$ when $N \to \infty$. Finally, we have for all $(r, t) \in [0,T]^2$,

$$n E_p\left( |\hat\gamma_H(r,t) - \gamma_p(r,t)| \right) \to 0, \quad \text{when } N \to \infty, \qquad (25)$$

and consequently,

$$n E_p\left\{ |\hat\gamma_{H,d}(r,t) - \gamma_p(r,t)| \right\} \to 0, \quad \text{when } N \to \infty.$$
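
The pointwise behaviour of the variance estimator can be explored with a small Monte Carlo experiment. The sketch below is an illustration for a real-valued variable under several assumptions that are not taken from the paper: rejective sampling is implemented as Poisson sampling conditioned on the sample size, the Poisson probabilities `p` are used as stand-ins for the true inclusion probabilities $\pi_k$ (they only coincide approximately), the estimator is written in the usual Hájek form with $\hat d(\pi) = \sum_{k \in s}(1-\pi_k)$, and the population, sizes and function names are invented.

```python
import random

def rejective_sample(p, n, rng):
    """Conditional Poisson sampling: redraw Poisson samples until the size is n."""
    while True:
        s = [k for k, pk in enumerate(p) if rng.random() < pk]
        if len(s) == n:
            return s

def ht_mean(y, pi, s, N):
    """Horvitz-Thompson estimator of the population mean."""
    return sum(y[k] / pi[k] for k in s) / N

def hajek_variance_estimate(y, pi, s, N):
    """Hajek-type variance estimator (assumed form, scalar case)."""
    d_hat = sum(1 - pi[k] for k in s)
    r_hat = sum((1 - pi[k]) * y[k] / pi[k] for k in s) / d_hat
    return sum((1 - pi[k]) * (y[k] / pi[k] - r_hat) ** 2 for k in s) / N ** 2

rng = random.Random(2012)
N, n = 200, 40
x = [rng.uniform(1.0, 3.0) for _ in range(N)]      # hypothetical size variable
x_total = sum(x)
p = [n * xk / x_total for xk in x]                  # probabilities summing to n
y = [2.0 * xk + rng.gauss(0.0, 0.5) for xk in x]    # hypothetical study variable

means, var_estimates = [], []
for _ in range(300):
    s = rejective_sample(p, n, rng)
    means.append(ht_mean(y, p, s, N))
    var_estimates.append(hajek_variance_estimate(y, p, s, N))

mc_mean = sum(means) / len(means)
mc_var = sum((m - mc_mean) ** 2 for m in means) / (len(means) - 1)
avg_estimate = sum(var_estimates) / len(var_estimates)
print("Monte Carlo variance of the HT mean:", mc_var)
print("Average Hajek variance estimate:    ", avg_estimate)
```

Comparing the two printed values for increasing `N` and `n` gives an empirical feel for the consistency statement, although it is no substitute for the proof.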
Step 2. Tightness

To check the tightness of $n(\hat\gamma_H(t,t) - \gamma_H(t,t))$ in $C([0,T])$, we use Theorem 12.3 from Billingsley (1968). Since the pointwise consistency of $n(\hat\gamma_H - \gamma_H)$ implies that $n(\hat\gamma_H(0,0) - \gamma_H(0,0))$ is tight, to get the announced result, it remains to study the increments of $n(\hat\gamma_H - \gamma_H)$ between two instants $t$ and $r$. Considering

$$d^2(t,r) = n^2 E_p\left( |\hat\gamma_H(t,t) - \gamma_H(t,t) - \hat\gamma_H(r,r) + \gamma_H(r,r)|^2 \right),$$

we only need to prove that

$$d^2(t,r) \le C|t-r|^{2\beta},$$

for some positive constant $C$ and all $(r, t) \in [0,T]^2$. Since $\beta > 1/2$, the above inequality implies that the sequence $n(\hat\gamma_H - \gamma_H)$ is tight in $C([0,T])$.

Using (21), we can decompose $d^2(t,r)$ into 3 parts,

$$d^2(r,t) \le 3\left[ E_p\left( [B_1(t,t) - B_1(r,r)]^2 \right) + E_p\left( [B_2(t,t) - B_2(r,r)]^2 \right) + E_p\left( [B_3(t,t) - B_3(r,r)]^2 \right) \right] := 3\left( d_{B_1}^2 + d_{B_2}^2 + d_{B_3}^2 \right).$$

Using Lemma A.1 and assumptions (A1)-(A2), we obtain



$$d_{B_1}^2 \le \dots \le C_1|t-r|^{2\beta}. \qquad (26)$$

With assumptions (A1), (A2) and (A6) and Lemma A.1 we get

$$d_{B_2}^2 \le \dots \le C|t-r|^{2\beta}. \qquad (27)$$

Finally,



n 2 1 



Bs ~ N 4 d(ir) 2 



+ 



+ 



2n z 1 
W d(7r) 

n 2 1 



EE 

k k> 

,EE 

k k'^V 

EE 



E, 



k^l k'^V 
:= bi+b 2 + b 3 . 



E, 



Ife 



*2 

Ifcj 



1 



A'/' 



|0fc,fc(t,r)||0fc/ i fe/(t,r)| 
|0JMX*> r )ll0*',«'(*> r )l 
|0fc,K*> r )ll0fc',J'(*>»')l 



Thanks to Lemma A.1 and under assumptions (A1), (A2) and (A4) we get

$$b_1 \le \dots \le C|t-r|^{2\beta}. \qquad (28)$$

Under assumptions (A1), (A2), (A4) and (A5) and using Lemma A.2 we have



$$b_3 \le \dots \le C|t-r|^{2\beta}. \qquad (29)$$

Using the Cauchy-Schwarz inequality together with bounds (28) and (29), we get that $b_2 \le C|t-r|^{2\beta}$, so that

$$d_{B_3}^2 \le C|t-r|^{2\beta}. \qquad (30)$$

Finally, we deduce, with inequalities (26), (27) and (30), that

$$d^2(r,t) \le C|t-r|^{2\beta}. \qquad (31)$$



A.4 Proof of Proposition 3.3

Under assumptions (A1) and (A2), it is clear that $d(\pi)/\hat d(\pi) = 1 + o_p(1)$. The pointwise convergence of $n\hat\gamma^*_{H,d}(r,t)$ is then a direct consequence of Proposition 3.2 and the fact that

$$\hat\gamma^*_{H,d}(r,t) = \frac{d(\pi)}{\hat d(\pi)}\, \hat\gamma_{H,d}(r,t).$$

Furthermore, we may write

$$n\left( \hat\gamma^*_{H,d} - \gamma_H \right) = \frac{d(\pi)}{\hat d(\pi)}\, n\left( \hat\gamma_{H,d} - \gamma_H \right) + \left( \frac{d(\pi)}{\hat d(\pi)} - 1 \right) n\gamma_H.$$

By Slutsky's theorem, the first term on the right-hand side of the previous equation converges in distribution to zero in $C([0,T])$, while the second term goes to zero in probability since $\sup_{(r,t) \in [0,T]^2} |n\gamma_H(r,t)| < \infty$ and $\frac{d(\pi)}{\hat d(\pi)} - 1 = o_p(1)$. Hence, the sequence $n(\hat\gamma^*_{H,d} - \gamma_H)$ converges in distribution to zero in $C([0,T])$.
A.5 Proof of Proposition 3.4

We first note that the interpolation error, bounded in (17), satisfies

$$n^{3/2}\, |\hat\gamma_{H,d}(r,t) - \hat\gamma_H(r,t)| = O(1) \qquad (32)$$

provided that $\lim_{N \to \infty} \max_{i \in \{1,\ldots,d_N-1\}} |t_{i+1} - t_i|^{2\beta} = O(n^{-1})$. We then use the fact (see Theorem 1 in Boistard et al. (2012)) that for rejective sampling the terms $c_{kl}$ defined in (19) satisfy, for some constant $C$,

$$\max_{k \ne l} |c_{kl}| \le C d(\pi)^{-1}.$$

Thus, bound (20) is now

$$d(\pi)^2\, |\gamma_H(r,t) - \gamma_p(r,t)| = O(1). \qquad (33)$$

If we examine now the sampling error, we can check that the terms $B_1$ and $B_2$ are of order $n^{-1}$. Concerning the term $B_3$, it is bounded by the sum $v_1 + v_2 + v_3$ with $v_1 = O(d^{-1}(\pi))$ and $v_2 \le \sqrt{v_1 v_3}$. Thanks to Proposition 3.1 we get that the term $v_3$ satisfies $v_3 = O(d^{-1}(\pi))$ and consequently, $E_p(B_3(r,t)^2) = O(n^{-1})$. Thus,

$$n^2 E_p\left[ (\hat\gamma_H(r,t) - \gamma_H(r,t))^2 \right] = O(n^{-1})$$

and the proof is complete.